Data Tables and Plot 1

Column Tab 1

# A tibble: 32,561 × 15
     age workclass   fnlwgt education    education.num marital.status occupation
   <dbl> <chr>        <dbl> <chr>                <dbl> <chr>          <chr>     
 1    90 ?            77053 HS-grad                  9 Widowed        ?         
 2    82 Private     132870 HS-grad                  9 Widowed        Exec-mana…
 3    66 ?           186061 Some-college            10 Widowed        ?         
 4    54 Private     140359 7th-8th                  4 Divorced       Machine-o…
 5    41 Private     264663 Some-college            10 Separated      Prof-spec…
 6    34 Private     216864 HS-grad                  9 Divorced       Other-ser…
 7    38 Private     150601 10th                     6 Separated      Adm-cleri…
 8    74 State-gov    88638 Doctorate               16 Never-married  Prof-spec…
 9    68 Federal-gov 422013 HS-grad                  9 Divorced       Prof-spec…
10    41 Private      70037 Some-college            10 Never-married  Craft-rep…
# ℹ 32,551 more rows
# ℹ 8 more variables: relationship <chr>, race <chr>, sex <chr>,
#   capital.gain <dbl>, capital.loss <dbl>, hours.per.week <dbl>,
#   native.country <chr>, income <chr>

---
title: "Adult Census"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    logo: download.png
    source_code: embed
    social: menu
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(plotly)
library(knitr)
library(DT)

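# Load the adult census data (the path is specific to the author's machine)
df <- read_csv('~/Statistical Analysis with R/adult_census.csv')

# Plot 1: proportion of each sex within each education level
plot1 <- df %>% 
  ggplot() + 
  geom_bar(mapping = aes(y = education, fill = sex), 
           position = 'fill') +
  labs(x = "Proportion", y = "Education Status", fill = "Sex")

# Plot 2: proportion of each race within each native country
plot2 <- df %>% 
  ggplot() + 
  geom_bar(mapping = aes(y = native.country, fill = race), 
           position = 'fill') +
  labs(x = "Proportion", y = "Native Country", fill = "Race")

# Plot 3: proportion of each marital status within each sex
plot3 <- df %>% 
  ggplot() + 
  geom_bar(mapping = aes(x = sex, fill = marital.status), 
           position = 'fill') +
  labs(x = "Sex", y = "Proportion", fill = "Marital Status")

# Plot 4: number of respondents at each age, with one dodged bar per work class
plot4 <- df %>% 
  ggplot() + 
  geom_bar(mapping = aes(x = age, fill = workclass), 
           position = 'dodge') +
  labs(x = "Age", y = "Count", fill = "Work Class")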
```

{.sidebar}
=======================================================================

### 1. Census Definition

census (noun): a usually complete count of a population (as of a state); especially, a periodic governmental count of a population that usually includes social and economic information (as occupations, ages, and incomes).

Source: https://www.merriam-webster.com/dictionary/census

### 2. Census Dataset

This census data set consists of 15 variables, a mix of numerical and categorical. The target variable is `income`, since the data set is used to predict whether an individual's income is greater than $50k or not, using the other variables and demographics from the adult census data.
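
As a quick check on the target variable, the class balance of `income` can be tabulated. The sketch below uses a small made-up tibble (`toy`, hypothetical values) so that it runs on its own; with the real data, `df` would take its place. The level labels `<=50K` and `>50K` are an assumption about how the file encodes income.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for df; only the income column matters here.
toy <- tibble(income = c("<=50K", "<=50K", ">50K", "<=50K"))

# Count each income class and convert the counts to proportions.
income_balance <- toy %>%
  count(income) %>%
  mutate(prop = n / sum(n))

income_balance
```

A strongly unbalanced result would be worth keeping in mind when reading the proportion plots on the following pages.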


### 3. More Information

More information about this dataset can be found at https://archive.ics.uci.edu/dataset/2/adult


Data Tables and Plot 1
=======================================================================

Column {.tabset data-width=500}
-----------------------------------------------------------------------

### Column Tab 1

```{r}
df
```


### Column Tab 2

```{r}
datatable(df, options = list(
  pageLength = 25
))
```


Column {data-width=500}
-----------------------------------------------------------------------

### Row 1

```{r}
plot1
```

### Row 2

```{r}
ggplotly(plot1)
```


Plots 2 and 3
=======================================================================

Column {data-width=500}
-----------------------------------------------------------------------

#### 1. Plot 2

Plot 2 shows the proportion of each race within each native country. This indicates the racial makeup of the respondents from each country of origin and gives insight into which part of the world individuals in the data set came from. Note that these proportions describe the census respondents, not the populations of those countries.

#### 2. Plot 3

Plot 3 shows the proportions of marital status by sex. One striking result is that a large proportion of the women in the data set were never married. This could be investigated further to see whether it is explained by the ages of the participants or by another underlying cause.
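
One way to follow up on the never-married result is to split it by age band and check whether the gap between the sexes persists at comparable ages. This is a minimal sketch, assuming the `age`, `sex`, and `marital.status` columns from the data set; the `toy` rows and the age cut points are made up for illustration and are not taken from the census file.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for df with the three columns this check needs.
toy <- tribble(
  ~age, ~sex,     ~marital.status,
  22,   "Female", "Never-married",
  24,   "Male",   "Never-married",
  35,   "Female", "Divorced",
  38,   "Male",   "Married-civ-spouse",
  52,   "Female", "Widowed",
  57,   "Male",   "Married-civ-spouse"
)

# Share of never-married respondents within each sex and age band.
# right = FALSE makes the bands [0, 30), [30, 45), [45, Inf).
never_married_by_age <- toy %>%
  mutate(age_band = cut(age,
                        breaks = c(0, 30, 45, Inf),
                        labels = c("under 30", "30-44", "45+"),
                        right = FALSE)) %>%
  group_by(sex, age_band) %>%
  summarise(share_never_married = mean(marital.status == "Never-married"),
            .groups = "drop")

never_married_by_age
```

If the shares for women and men are similar within each band on the real data, the overall difference is more likely driven by the age mix than by sex itself.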


Column {data-width=500}
-----------------------------------------------------------------------

### Row 1

```{r}
ggplotly(plot2)
```

### Row 2

```{r}
ggplotly(plot3)
```


Plot 4
=======================================================================

Column {data-width=500}
-----------------------------------------------------------------------

#### 1. Plot 4

Plot 4 shows the number of respondents in each work class by age. The majority of every age group is in the Private work class, especially at younger ages. Private here most likely means private-sector employment, since self-employment appears as separate work class levels, so the plot on its own does not show whether younger participants are working for themselves or running a small business. Another plot that could go along with this is education level against work class, to see whether a high school education is associated with a higher share of the Private work class.
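
The education-against-work-class plot mentioned above could be sketched in the same style as the other plots. The `toy` tibble is a made-up stand-in so the example runs on its own, and `plot5` is a hypothetical name; with the real data, `df` would replace `toy`.

```r
library(ggplot2)
library(tibble)

# Made-up stand-in for df; values are illustrative only.
toy <- tribble(
  ~education,  ~workclass,
  "HS-grad",   "Private",
  "HS-grad",   "Private",
  "HS-grad",   "Local-gov",
  "Bachelors", "Private",
  "Bachelors", "Self-emp-inc",
  "Doctorate", "State-gov"
)

# Proportion of each work class within each education level.
plot5 <- ggplot(toy) +
  geom_bar(mapping = aes(y = education, fill = workclass),
           position = "fill") +
  labs(x = "Proportion", y = "Education Status", fill = "Work Class")

plot5
```

Using `position = "fill"` keeps the education levels comparable even when some levels have far fewer respondents than others.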

#### 2. Missing Data

I did not remove the missing values, marked by "?", from the plots, including Plot 4. In Plot 4 the missing data adds insight into the data set: missing work class values make sense because the majority of them belong to people under 25 years of age, who might not be working yet or might have other reasons for not reporting a work class to the census.
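
The claim about the age of respondents with a missing work class can be checked directly. This is a minimal sketch, assuming missing values are stored as the string "?" as in the table output; the `toy` rows are made up so the example runs on its own, and `df` would replace `toy` for the real check.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for df; values are illustrative only.
toy <- tribble(
  ~age, ~workclass,
  19,   "?",
  21,   "?",
  23,   "Private",
  40,   "Private",
  67,   "?",
  45,   "State-gov"
)

# Among rows with a missing work class, what share is under 25?
missing_by_age <- toy %>%
  filter(workclass == "?") %>%
  summarise(n_missing = n(),
            share_under_25 = mean(age < 25))

missing_by_age
```

A `share_under_25` above 0.5 on the real data would support the statement that most of the missing work class values come from respondents under 25.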


Column {data-width=500}
-----------------------------------------------------------------------

### Row 1

```{r}
plot4
```

### Row 2

```{r}
ggplotly(plot4)
```